Enriching Source for English-to-Urdu Machine Translation
Abstract
This paper focuses on the generation of case markers for free-word-order languages that use case markers as phrasal clitics to mark the relationship between a dependent noun and its head. Generating such clitics becomes an essential task, especially when translating from fixed-word-order languages, where syntactic relations are identified by the positions of the dependent nouns. To address the problem of markers missing on the source side, artificial markers are added to the source to improve its alignment with the target-side markers. An increase of up to 1 BLEU point over the baseline is observed on different English-to-Urdu test sets.
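The source-enrichment idea above can be sketched as a preprocessing step over a dependency-parsed English sentence: after each subject or object phrase, a marker token is inserted at the position where an Urdu case clitic (such as ne or ko) would follow the corresponding noun phrase. This is a minimal sketch under stated assumptions, not the authors' method. The parse is supplied by hand in a Universal Dependencies-style format, the tree is assumed to be projective, and the marker tokens (`<m:subj>`, `<m:obj>`, `<m:iobj>`) and helper names (`subtree_end`, `add_artificial_markers`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    form: str    # surface word
    head: int    # 1-based index of the head token; 0 marks the root
    deprel: str  # dependency label (Universal Dependencies style assumed)


# Hypothetical relation-to-marker inventory, for illustration only;
# it is not the marker set used in the paper.
MARKERS: Dict[str, str] = {
    "nsubj": "<m:subj>",
    "obj": "<m:obj>",
    "dobj": "<m:obj>",   # older Stanford-dependencies label for direct objects
    "iobj": "<m:iobj>",
}


def subtree_end(tokens: List[Token], i: int) -> int:
    """Return the rightmost 1-based index covered by the subtree rooted at token i."""
    end = i
    for j, tok in enumerate(tokens, start=1):
        if tok.head == i:
            end = max(end, subtree_end(tokens, j))
    return end


def add_artificial_markers(tokens: List[Token]) -> List[str]:
    """Insert a marker token after each subject/object phrase, mimicking a
    phrase-final case clitic. Assumes a projective tree, so every phrase
    is a contiguous span ending at subtree_end()."""
    inserts: Dict[int, List[str]] = defaultdict(list)
    for i, tok in enumerate(tokens, start=1):
        marker = MARKERS.get(tok.deprel)
        if marker is not None:
            inserts[subtree_end(tokens, i)].append(marker)
    out: List[str] = []
    for i, tok in enumerate(tokens, start=1):
        out.append(tok.form)
        out.extend(inserts.get(i, []))
    return out


# Hand-written parse of "The man with a hat saw the dog ."
sentence = [
    Token("The", 2, "det"),
    Token("man", 6, "nsubj"),
    Token("with", 5, "case"),
    Token("a", 5, "det"),
    Token("hat", 2, "nmod"),
    Token("saw", 0, "root"),
    Token("the", 8, "det"),
    Token("dog", 6, "obj"),
    Token(".", 6, "punct"),
]
print(" ".join(add_artificial_markers(sentence)))
# The man with a hat <m:subj> saw the dog <m:obj> .
```

Placing the marker at the end of the phrase rather than directly after the head noun reflects the phrasal-clitic behavior described in the abstract: in the example, the subject marker follows "with a hat", not "man". In a pipeline of this kind, the enriched sentence would be passed to the word aligner in place of the plain source.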
Similar sources
Word-Order Issues in English-to-Urdu Statistical Machine Translation
We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experi...
AGHAZ: An Expert System Based approach for the Translation of English to Urdu
Machine Translation (MT) of English text to its Urdu equivalent is a difficult challenge. Many attempts have been made, but only a few limited solutions have been provided so far. We present a direct approach that uses an expert system to translate English text into equivalent Urdu, using The Unicode Standard, Version 4.0 (ISBN 0-321-18578-1), range 0600–06FF. The expert system works with a knowledg...
Improving Machine Translation via Triangulation and Transliteration
In this paper, we improve Urdu→Hindi→English machine translation through triangulation and transliteration. First, we build an Urdu→Hindi SMT system by inducing triangulated and transliterated phrase tables from Urdu–English and Hindi–English phrase translation models. We then use it to translate the Urdu part of the Urdu–English parallel data into Hindi, thus creating an artificial Hindi-English...
Automatic Learning of Morphological Variations for Handling Out-of-Vocabulary Terms in Urdu-English Machine Translation
We present an approach for online handling of Out-of-Vocabulary (OOV) terms in Urdu-English MT. Since Urdu is morphologically richer than English, we expect a large portion of the OOV terms to be Urdu morphological variations that are irrelevant to English. We describe an approach to automatically learn English-irrelevant (target-irrelevant) Urdu (source) morphological variation rules from standa...
Model for English-Urdu Statistical Machine Translation
There are over 60 million first-language speakers of Urdu and over 104 million second-language speakers. Much of the knowledge available on the internet that is useful to these speakers is in English. The typological contrast between the two languages makes the pair interesting to study for Statistical Machine Translation. However, there is almost no freely available parallel aligned data for the selected language pa...